A New Textual/Non-Textual Classifier for Document Skew Correction
نویسندگان
چکیده
This work is supported by National Nature Science Foundation of China (69982005) and Projects of Development Plan of the State Key Foundation Search (G199803050703) Abstract: A robust approach is proposed for document skew detection. We use Fourier analysis and SVM to classify textual areas from non-textual areas of documents. We also propose a robust method to determine the skew angle from textual areas. Our approach achieves good performance on documents with large area of non-textual contents.
منابع مشابه
Skew Angle Estimation and Correction of Hand Written, Textual and Large areas of Non-textual Document Images: A Novel Approach
Skew angle estimation and correction of a document page is an important task for document analysis and optical character recognition (OCR) applications. Many approaches of skew detection can process pure textual document images successfully. But it is a challenging problem to process documents like handwritten, large areas of non-textual contents. In this direction, a novel approach for textual...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملSkew Correction of Textural Documents
Two algorithms for accurate skew detection and correction of textual documents are presented. They depend on finding a horizontal RLSA image of the skewed document. The average skew of selected black connected components in the RLSA image is considered as the skew angle for the whole document which is finally rotated in the opposite direction by that amount to obtain the final corrected image. ...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملFiducial line based skew estimation
Skew estimation for textual document images is a well-researched topic and numerals of methods have been reported in the literature. One of the major challenges is the presence of interfering non-textual objects of various types and quantities in the document images. Many existing methods require proper separation of the textual objects which are well aligned from the non-textual objects which ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002